---
output: 
    pdf_document: #bookdown::word_document2: 
    fig_caption: yes
    keep_tex: yes
    toc: no
    number_sections: no
    latex_engine: xelatex
    #pandoc_args: --lua-filter=multibib.lua
title: "Introducing the Trust in Government (TrustGov) Dataset: A New Resource for Cross-National Time-Series Trust Research"
#thanks: "Corresponding author: [yhcasstai@psu.edu](mailto:yhcasstai@psu.edu). Current version: `r format(Sys.time(), '%B %d, %Y')`.  Replication materials and complete revision history may be found at [https://github.com/Tyhcass/TGOV](https://github.com/Tyhcass/TGOV). "
#date: "`r format(Sys.time(), '%B %d, %Y')`"
editor_options: 
  markdown: 
    wrap: sentence
tables: true # enable longtable and booktabs
#citation_package: natbib
#citeproc: false
citeproc: true
csl: https://raw.githubusercontent.com/citation-style-language/styles/master/sage-harvard.csl
fontsize: 12pt
indent: true
linestretch: 1.5 # double spacing using linestretch 1.5
bibliography: trust-gov.bib
#  text: trust-gov.bib
#  app: 
#biblio-style: apsr
citecolor: black
linkcolor: black
endnote: no
header-includes:
      - \usepackage{array}
      - \usepackage{caption}
      - \usepackage{graphicx}
      - \usepackage{siunitx}
      - \usepackage{colortbl}
      - \usepackage{multirow}
      - \usepackage{hhline}
      - \usepackage{calc}
      - \usepackage{tabularx}
      - \usepackage{threeparttable}
      - \usepackage{wrapfig}
      - \usepackage{fullpage}
      - \usepackage{lscape} #\usepackage{lscape} better for printing, page displayed vertically, content in landscape mode, \usepackage{pdflscape} better for screen, page displayed horizontally, content in landscape mode
      - \newcommand{\blandscape}{\begin{landscape}}
      - \newcommand{\elandscape}{\end{landscape}}
      - \usepackage{titlesec}
      - \titleformat*{\section}{\normalsize\bfseries}
      - \titleformat*{\subsection}{\normalsize\itshape}
      - \usepackage{titling} #use \maketitle repeatedly  
---

\pagenumbering{gobble}

# Authors {.unnumbered}

- Yuehong Cassandra Tai, corresponding author, ORCID: <https://orcid.org/0000-0001-7303-7443>, Assistant Research Professor, Center for Social Data Analytics, Pennsylvania State University, [yhcasstai\@psu.edu](mailto:yhcasstai@psu.edu){.email}

# Data Availability Statement
Replication materials are available on OSF: [https://osf.io/qr6ca/](https://osf.io/qr6ca/).
The complete revision history is available on GitHub: [https://github.com/Tyhcass/TGOV](https://github.com/Tyhcass/TGOV).

# Conflict of Interest Disclosure
The author declares no competing interests.

# Acknowledgements 

\pagebreak


```{=tex}
\renewcommand{\baselinestretch}{1}
\selectfont
\maketitle
\renewcommand{\baselinestretch}{1.5}
\selectfont
```


```{=tex}
\begin{abstract}
Although political trust is a long-standing interdisciplinary topic, the lack of comparable cross-national time-series data has limited scholars' ability to analyze its determinants and consequences and to generalize findings across countries and over time. To address this gap, this paper introduces the Trust in National Government (TrustGov) Dataset, a cross-national time-series resource covering 115 countries and territories from 1973 to 2020 that harmonizes 1,545 country-year observations from 189 national and cross-national surveys using a Bayesian latent variable model. The dataset is validated through a series of convergent and construct validation tests. TrustGov supports qualitative and mixed-method research by guiding case selection and helping scholars probe the mechanisms behind shifts in trust. It also enables quantitative analyses of trust's dynamic relationships with determinants such as inequality, election integrity, and institutional performance, as well as with outcomes such as policy preferences and crisis resilience. The project will be updated through periodic releases as new data become publicly available, supporting ongoing research on political trust.
\end{abstract}

Keywords: political trust, trust in government, latent variable model, dataset
```

\pagebreak

\pagenumbering{arabic}
```{r setup, include=FALSE}
options(tinytex.verbose = TRUE)
knitr::opts_chunk$set(
  echo = FALSE,
  message = FALSE,
  warning = FALSE,
  cache = TRUE,
  dpi = 600,
  fig.width=7,
  fig.height = 2.5,
  plot = function(x, options)  {
    hook_plot_tex(x, options)
  }
)

if (!require(pacman))
    install.packages("pacman")
library(pacman)
#p_install(janitor, force = FALSE)
#remotes::install_github("fsolt/gesisdata")
#remotes::install_github("fsolt/DCPOtools")
##install.packages('wbstats')
#remotes::install_github("stan-dev/cmdstanr")

p_load(
  DCPOtools,
  DCPO,
  cmdstanr,
  tidyverse,
  here,
  maps,
  countrycode,
  wbstats, 
  tidybayes,
  scales,
  patchwork,
  ggthemes,
  rsdmx,
  readxl,
  osfr,
  kableExtra,
  bayesplot,
  stringr
) 

set.seed(313)

```

```{r define_funs}
# define functions
# round halves up (base round() rounds halves to the nearest even number)
round_up <- function(x) floor(x + 0.5)
validation_plot <- function(v_data_raw,
                            lab_x = .38, lab_y = 92,
                            theta_summary, theta_results) {
    
    # defaults per https://stackoverflow.com/a/49167744/2620381
    if (missing(theta_summary) && exists("theta_summary", envir = .GlobalEnv, inherits = FALSE))
        theta_summary <- get("theta_summary", envir = .GlobalEnv)
    if (missing(theta_results) && exists("theta_results", envir = .GlobalEnv, inherits = FALSE))
        theta_results <- get("theta_results", envir = .GlobalEnv)
  

    median_val <- Vectorize(function(x) median(1:x),
                            vectorize.args = "x")
    
    v_vars <- v_data_raw %>% 
      select(item0 = item,
             title = title) %>% 
      distinct() %>% 
      mutate(v_val = str_extract(item0, "\\d+") %>% 
               as.numeric() %>% 
               median_val(.) %>%
               { . + 0.6 } %>% 
               round_up())
    
    validation_summarized <- v_data_raw %>% 
      DCPOtools::format_dcpo(scale_q = v_vars$item0[[1]], # these arguments are required
                             scale_cp = 1) %>% # but they don't matter
      pluck("data") %>% 
      mutate(item0 = str_remove(item, " \\d or higher")) %>% 
      right_join(v_vars, by = "item0") %>%
      arrange(title) %>% 
      mutate(title = factor(title, 
                            levels = v_data_raw %>%
                              pull(title) %>%
                              unique())) %>% 
      filter(str_detect(item, paste(v_val, "or higher"))) %>%
      mutate(iso2c = countrycode::countrycode(country,
                                              origin = "country.name",
                                              destination = "iso2c",
                                              warn = FALSE),
             prop = y_r/n_r,
             se = sqrt((prop*(1-prop))/n),
             prop_90 = prop + qnorm(.9)*se,
             prop_10 = prop - qnorm(.9)*se) %>%
      inner_join(theta_summary %>% select(-kk, -tt), by = c("country", "year"))
    
    validation_cor <- theta_results %>%
      inner_join(validation_summarized %>%
                   select(country, year, title, prop, se),
                 by = c("country", "year")) %>% 
      rowwise() %>% 
      mutate(sim = rnorm(1, mean = prop, sd = se)) %>% 
      ungroup() %>% 
      select(title, theta, sim, draw) %>% 
      nest(data = c(theta, sim)) %>% 
      mutate(r = lapply(data, function(df) cor(df)[2,1]) %>% 
               unlist()) %>%
      select(-data) %>% 
      group_by(title) %>% 
      summarize(r = paste("R =", round(mean(r), 2)))

    if ({validation_summarized %>%
        pull(country) %>%
        unique() %>% 
        length()} > 1) {
      val_plot <- validation_summarized %>%
        ggplot(aes(x = mean,
                   y = prop * 100)) +
        geom_segment(aes(x = q10, xend = q90,
                         y = prop * 100, yend = prop * 100),
                     na.rm = TRUE,
                     alpha = .2) +
        geom_segment(aes(x = mean, xend = mean,
                         y = prop_90 * 100, yend = prop_10 * 100),
                     na.rm = TRUE,
                     alpha = .2) +
        geom_smooth(method = 'lm', formula = 'y ~ x', se = FALSE) +
        facet_wrap(~ title, ncol = 4) +
        geom_label(data = validation_cor, aes(x = lab_x,
                                              y = lab_y,
                                              label = r),
                   size = 2.5)
    } else {
      val_plot <- validation_summarized %>%
        ggplot(aes(x = year,
                   y = mean)) +
        geom_line() +
        geom_ribbon(aes(ymin = q10,
                        ymax = q90,
                        linetype = NA),
                    alpha = .2) +
        geom_point(aes(y = prop),
                   fill = "black",
                   shape = 21,
                   size = .5,
                   na.rm = TRUE) +
        geom_path(aes(y = prop),
                  linetype = 3,
                  na.rm = TRUE,
                  alpha = .7) +
        geom_segment(aes(x = year, xend = year,
                         y = prop_90, yend = prop_10),
                     na.rm = TRUE,
                     alpha = .2) +
        facet_wrap(~ title, ncol = 4) +
        geom_label(data = validation_cor, aes(x = lab_x,
                                              y = lab_y,
                                              label = r),
                   size = 2.5)
    }
    
    return(val_plot)
}

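# percentage of country-years with data within each country's observed span
# (first to last observed year, inclusive)
covered_share_of_spanned <- function(dcpo_input_raw) {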
  n_cy <- dcpo_input_raw %>%
    distinct(country, year) %>% 
    nrow()
  
  spanned_cy <- dcpo_input_raw %>% 
    group_by(country) %>% 
    summarize(years = max(year) - min(year) + 1) %>% 
    summarize(n = sum(years)) %>% 
    pull(n)
  
  {(n_cy/spanned_cy) * 100}
}


```

```{r dcpo_input_raw, include=FALSE, eval=FALSE}
surveys_tb <- read_csv(here::here("rawdata",
                                  "trust_gov_survey.csv"),
                       col_types = "cccccc")

dcpo_input_raw_gov <- DCPOtools::dcpo_setup(vars = surveys_tb,
                                         datapath = here("..",
                                                         "data",
                                                         "dcpo_surveys"),
                                         file = here("data",
                                                     "dcpo_input_raw_gov.csv"))
```

```{r tgov_summary_stats}


surveys <-  read_csv(here::here("rawdata/trust_gov_survey.csv"),
                    col_types = "ccccccc") 

dcpo_input_raw_gov <- read_csv(here::here("data","dcpo_input_raw_gov.csv")) %>%
  filter(country != "Taiwan")

with_min_coverage <- function(x, min_cov) {
  if (!is.na(min_cov)) {
    country <- year <- years <- spanned <- coverage <- NULL
    x <- x %>%
      group_by(country) %>%
      mutate(years = length(unique(year)),
             spanned = length(min(year):max(year)),
             coverage = years/spanned) %>%
      filter(coverage >= min_cov) %>%
      select(-years, -spanned, -coverage) %>%
      ungroup()
  }
  return(x)
}
with_max_gap <- function(x, max_gap, edges = TRUE) {
    if (!is.na(max_gap)) {
        country <- yr_obs <- NULL
        c_yrs <- x %>% 
            group_by(country, year) %>% 
            summarize(year = first(year)) %>% 
            mutate(lead_span = ifelse(!is.na(lead(year)),
                                      lead(year) - year - 1,
                                      50),
                   lag_span = ifelse(!is.na(lag(year)),
                                     year - lag(year) - 1,
                                     50),
                   min_span = pmin(lead_span, lag_span),
                   max_span = pmax(lead_span, lag_span),
                   drop = min_span > max_gap & max_span == 50)
        
        x <- x %>% 
          left_join(c_yrs,
                    by = c("country", "year")) %>% 
          filter(!drop) %>% 
          select(-contains("span")) %>% 
          select(-drop)
    }
    return(x)
}

process_dcpo_input_raw <- function(dcpo_input_raw_df) {
  dcpo_input_raw_df %>% 
    with_min_yrs(3) %>% 
    with_min_cy(5) %>% 
    with_min_yrs(3) %>% # double-check after dropping <5 cy
    filter(year >= 1972 & n > 0) %>% 
    group_by(country) %>% 
    mutate(cc_rank = n()) %>% 
    ungroup() %>% 
    arrange(-cc_rank)
}


dcpo_input_raw_gov <- process_dcpo_input_raw(dcpo_input_raw_gov)



n_surveys <- surveys %>%
  distinct(survey) %>% 
  nrow()  #189

n_items <- dcpo_input_raw_gov %>%
  distinct(item) %>% 
  nrow() #10 items

n_countries <- dcpo_input_raw_gov %>%
  distinct(country) %>% 
  nrow()  #115

n_cy <- dcpo_input_raw_gov %>%
  distinct(country, year) %>% 
  nrow() %>% 
  scales::comma() #1,555 country year

n_years <- as.integer(summary(dcpo_input_raw_gov$year)[6]-summary(dcpo_input_raw_gov$year)[1]) # years

spanned_cy <- dcpo_input_raw_gov %>% 
  group_by(country) %>% 
  summarize(years = max(year) - min(year) + 1) %>% 
  summarize(n = sum(years)) %>% 
  pull(n) %>% 
  scales::comma() #2648

total_cy <- {n_countries * n_years} %>% 
  scales::comma()  #5405

year_range <- paste("from",
                    summary(dcpo_input_raw_gov$year)[1], #1973
                    "to",
                    summary(dcpo_input_raw_gov$year)[6]) #2020
n_cyi <- dcpo_input_raw_gov %>% 
  distinct(country, year, item) %>% 
  nrow() %>% 
  scales::comma() #2,126
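# convert comma-formatted strings (e.g., "1,234") back to numbers;
# remove every comma so values of a million or more also parse
back_to_numeric <- function(string_number) {
  string_number %>% 
    str_remove_all(",") %>% 
    as.numeric()
}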

```



# Introduction
Trust is central to understanding political, social, and economic life.
It is shaped by factors such as inequality, corruption, election integrity, and government performance, and in turn underpins regime support, influences policy preferences, and conditions public responses during crises such as COVID-19 (for reviews of trust’s dynamics, determinants, and consequences, see @Levi2000; @Citrin2018; @Devine2021trust; @Kerr2024).

Although trust is a contested term, a common view is that trust is relational, directed at specific targets such as persons, groups, or institutions, and rarely unconditional [@Levi2000].
Two main forms are political trust and social trust.
Political trust refers to "trust in a specific actor or institution" [@Devine2024]; proposed dimensions include trust in representative versus implementing institutions and trust in political institutions versus authorities (for further discussion, see @Tai2022role [pp. 32-34]).
Social trust refers to "trust in human targets" [@Dinesen2020], including generalized social trust, particularized trust, and trust in specific groups.


Because political trust is essential to regime legitimacy, it has attracted extensive scholarly attention [@Citrin2018].
Yet, despite its prominence, testing trust-related theories comparatively has been constrained by the lack of comprehensive cross-national, time-series data.
Existing resources are often fragmented, limiting rigorous comparison of trust dynamics across political and regional contexts [@Kerr2024].
Much work focuses on a single country (often the U.S. or U.K.), and even comparative studies tend to concentrate on particular regions or democracies [@Devine2024].
As a result, scholars have limited leverage to examine how trust responds to governance performance, inequality, and corruption, and how it shapes regime support, voting behavior, and policy compliance.


To address this gap, I introduce TrustGov, a cross-national time-series dataset of public trust in national government covering 115 countries and territories from 1973 to 2020.
Scholars propose different dimensional frameworks for political trust, and a full discussion of these approaches is beyond the scope of this paper.
However, as with other comparative public attitude measures that lack multidimensionality [@Hu2025], empirical measures of political trust are often blunt.
Differentiating the object of trust helps capture its multidimensionality and supports future research [@Devine2024].
This dataset focuses on national government as a specific object of political trust because trust in national government has broad interdisciplinary relevance, spanning public health, behavioral science, economics, law, public policy, and environmental studies.

Using a Bayesian latent variable model developed by @Solt2020c, TrustGov synthesizes diverse survey sources into comparable estimates, drawing on `r n_surveys` surveys with `r n_cyi` country-year-item observations covering `r n_years` years.
In addition to mean estimates, I release full posterior draws so scholars can explicitly incorporate measurement uncertainty inherent in latent variable modeling.
With its temporal and cross-national coverage, TrustGov facilitates comparative research on the causes and consequences of trust in government, advancing work on democratic governance, electoral behavior, policy development, and related areas.

# Data & Methods

```{r itemcountryplots, fig.height = 4, fig.cap = "Countries and Years with the Most Observations in the Source Data of Trust in Government \\label{tgov_item_country_plots}",cache=FALSE}


countries_plot <- dcpo_input_raw_gov %>%
  mutate(country = if_else(stringr::str_detect(country, "United"),
                           stringr::str_replace(country, "((.).*) ((.).*)", "\\2.\\4."),
                           country)) %>% 
  distinct(country, year, item) %>% 
  count(country) %>%
  arrange(desc(n)) %>% 
  head(15) %>% 
  ggplot(aes(forcats::fct_reorder(country, n, .desc = TRUE), n)) +
  geom_bar(stat = "identity") +
  theme_bw() +
  theme(axis.title.x = element_blank(),
        axis.text.x  = element_text(angle = 90, vjust = .45, hjust = .95, size = 7),
        axis.title.y = element_text(size = 9),
        plot.title = element_text(hjust = 0.5, size = 11)) +
  ylab("Country-year-item \n observed") +
  ggtitle("Top countries \n by year–item observations")


cby_plot <- dcpo_input_raw_gov %>%
  mutate(country = if_else(stringr::str_detect(country, "United"),
                           stringr::str_replace(country, "((.).*) ((.).*)", "\\2.\\4."),
                           country),
         country = stringr::str_replace(country, "South", "S.")) %>% 
  distinct(country, year) %>%
  count(country) %>% 
  arrange(desc(n)) %>% 
  head(15) %>% 
  ggplot(aes(forcats::fct_reorder(country, n, .desc = TRUE), n)) +
  geom_bar(stat = "identity") +
  theme_bw() +
  theme(axis.title.x = element_blank(),
        axis.text.x  = element_text(angle = 90, vjust = .45, hjust = .95, size = 7),
        axis.title.y = element_text(size = 9),
        plot.title = element_text(hjust = 0.5, size = 11)) +
  ylab("Years\n observed") +
  ggtitle("Countries")

ybc_plot <- dcpo_input_raw_gov %>%
  distinct(country, year) %>%
  count(year, name = "nn") %>%
  ggplot(aes(year, nn)) +
  geom_bar(stat = "identity") +
   scale_x_continuous(
    minor_breaks = seq(1972, 2022, by = 1),
    breaks = seq(1972, 2022, by = 2), limits = c(1972, 2022))  +
  theme_bw() +
  theme(axis.title.x = element_blank(),
         axis.text.x  = element_text(angle = 90, vjust = .45, hjust = .95,size = 5),
        axis.title.y = element_text(size = 9),
        plot.title = element_text(hjust = 0.5, size = 11)) +
  xlab("Year") +
  ylab("Number of countries") +
  ggtitle("Number of countries \n observed by year")

world_map <- map_data("world") %>% 
  filter(!long > 180)

cby_map <- world_map %>% 
  distinct(region) %>% 
  mutate(country = countrycode::countrycode(region,
                                            "country.name",
                                            "country.name")) %>% 
  filter(!region=="Antarctica") %>% 
  left_join(dcpo_input_raw_gov %>% 
              count(country, year) %>% 
              count(country, name = "Years"),
            by = "country") %>% 
  mutate(Years = ifelse(is.na(Years), 0, Years)) %>% 
  ggplot(aes(fill = Years, map_id = region)) +
  geom_map(map = world_map,
           color = "white",
           linewidth = 0.06) +
  coord_map(projection = "mollweide", 
            ylim=c(-80, 90),
            xlim=c(-170, 170)) +
  theme_void() +
  scale_fill_distiller(na.value = "gray90", 
                       palette = "Blues",
                       direction = 1) +
  ggtitle("Years observed by country") +
  theme(plot.title = element_text(hjust = 0.5),
        legend.position = c(.05,.1),
        legend.justification = c(0,0), 
        legend.direction = "vertical") +
  scale_y_continuous(expand=c(0,0)) +
scale_x_continuous(expand=c(0,0))

items_plot <- dcpo_input_raw_gov %>%
  distinct(country, year, item) %>%
  count(item) %>%
  arrange(desc(n)) %>% 
  head(15) %>% 
  ggplot(aes(forcats::fct_reorder(item, n, .desc = TRUE), n)) +
  geom_bar(stat = "identity") +
  theme_bw() +
  theme(axis.title.x = element_blank(),
        axis.text.x  = element_text(angle = 90, vjust = .45, hjust = .95),
        axis.title.y = element_text(size = 9),
        plot.title = element_text(hjust = 0.5, size = 11)) +
  ylab("Country-years\n observed") +
  ggtitle("Items")



most_common_item <- dcpo_input_raw_gov %>% 
  count(item) %>% 
  arrange(-n) %>% 
  slice_head() %>% 
  pull(item) #

un_cy <- dcpo_input_raw_gov %>% filter(item == most_common_item ) %>% distinct(country, year) %>% nrow() #816

un_surveys <- dcpo_input_raw_gov %>% filter(item == most_common_item) %>% distinct(survey) %>% pull(survey) #75

#sum(stringr::str_detect(un_surveys, ",")) 

unique_survey <- length(unique(unlist(strsplit(un_surveys, ", ", fixed = TRUE)))) #47

most_common_item_cy <- dcpo_input_raw_gov %>% 
  filter(item == most_common_item) %>%
  distinct(country, year) %>%
  nrow()  

most_common_item_surveys <- dcpo_input_raw_gov %>%
  filter(item == most_common_item) %>%
  distinct(survey) %>%
  pull(survey) %>% 
  str_split(", ") %>% 
  unlist() %>% 
  unique() %>% 
  sort()

top_country_cyi <- dcpo_input_raw_gov %>% 
  distinct(country, year, item) %>%
  count(country) %>%
  arrange(-n) %>% 
  slice_head() %>%
 pull(country) 

top_country_cyi_obs <- dcpo_input_raw_gov %>%
  filter(country == top_country_cyi) %>%
  distinct(country, year, item) %>%
  nrow()  

others_gov <- dcpo_input_raw_gov %>%
  distinct(country, year, item) %>%
  count(country) %>%
  arrange(desc(n)) %>%
  slice(2:5) %>%
  pull(country) %>% 
  paste(collapse = ", ") %>% 
  str_replace(", (\\w+)$", ", and \\1") 

countries_cp <- dcpo_input_raw_gov %>%
  mutate(country = if_else(stringr::str_detect(country, "United"),
                           stringr::str_replace(country, "((.).*) ((.).*)", "\\2.\\4."),
                           country),
         country = stringr::str_replace(country, "South", "S.")) %>% 
  distinct(country, year, item) %>%
  count(country) %>% 
  arrange(desc(n)) %>% 
  head(20) %>% 
  pull(country)

countries_cbyp <- dcpo_input_raw_gov %>%
  mutate(country = if_else(stringr::str_detect(country, "United"),
                           stringr::str_replace(country, "((.).*) ((.).*)", "\\2.\\4."),
                           country),
         country = stringr::str_replace(country, "South", "S.")) %>% 
  distinct(country, year) %>%
  count(country) %>% 
  arrange(desc(n)) %>% 
  head(20) %>% 
  pull(country)

adding_gov <- setdiff(countries_cbyp, countries_cp) %>% 
  paste(collapse = ", ") %>% 
  str_replace(", (\\w*)$", ", and \\1")  #""

dropping_gov <- setdiff(countries_cp, countries_cbyp) %>% 
  paste(collapse = ", ") %>% 
  str_replace(", (\\w*)$", ", and \\1")  ##""

y_gov_peak_year <- dcpo_input_raw_gov %>%
  distinct(country, year) %>%
  count(year, name = "nn") %>% 
  filter(nn == max(nn)) %>% 
  pull(year) #2018

y_gov_peak_nn <- dcpo_input_raw_gov %>%
  distinct(country, year) %>%
  count(year, name = "nn") %>% 
  filter(nn == max(nn)) %>% 
  pull(nn) 

data_gov_poorest <- dcpo_input_raw_gov %>%
  distinct(country, year, item) %>%
  count(country) %>%
  arrange(n) %>%
  filter(n == min(n)) %>%
  pull(country) %>% 
  paste(collapse = ", ") %>% 
  str_replace(", (\\w+)$", ", and \\1")  

wordify_numeral <- function(x) setNames(c("one", "two", "three", "four", "five", "six", "seven", "eight", "nine", "ten", "eleven", "twelve", "thirteen", "fourteen", "fifteen", "sixteen", "seventeen", "eighteen", "nineteen"), 1:19)[x]



n_gov_data_poorest <- {data_gov_poorest %>%
    str_split(",") %>% 
    first()} %>% 
  length() %>% 
  wordify_numeral()


cby_map + (countries_plot/ybc_plot) + plot_layout(widths = c(3.5, 1.5))

```

## Raw Data

Although many national and cross-national surveys have asked questions about trust in national government, comparative data at the aggregate level is sparse and fragmented.
This fragmentation is primarily due to limited coverage across countries and years, as well as inconsistencies in question wording and interpretation.

To construct a dynamic and comparable dataset, I systematically reviewed `r n_surveys` unique survey projects spanning `r n_countries` countries/territories over `r n_years` years.
I identified `r n_items` unique survey questions that captured public attitudes toward trust in national government.
To improve comparability and reduce uncertainty from sparse data, I excluded rarely asked survey items.

Within the span between each country's first and last observed years, which totals `r spanned_cy` country-years, `r {back_to_numeric(n_cy)/back_to_numeric(spanned_cy) * 100} %>% round()`% have survey data.
A complete panel, with every surveyed country observed in every year, would contain `r total_cy` country-years.
Even after collecting as much national and cross-national data as was available, the current source data include only `r n_cy` country-years, or `r {back_to_numeric(n_cy)/back_to_numeric(total_cy) * 100} %>% round()`% of that complete panel.
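
The three quantities in this paragraph (observed, spanned, and complete-panel country-years) can be illustrated on a toy panel.
The sketch below uses made-up data; the `toy` data frame and the `toy_*` names are hypothetical and are not part of the TrustGov source data, and the overall period is counted inclusively as an assumption of the example.

```r
# Hypothetical toy panel (not TrustGov data): two countries, scattered survey years
toy <- data.frame(
  country = c("A", "A", "A", "B", "B"),
  year    = c(2000, 2002, 2005, 2003, 2004)
)

# Observed country-years: distinct country-year pairs with any survey data
toy_observed <- nrow(unique(toy[, c("country", "year")]))

# Spanned country-years: first to last observed year within each country, inclusive
toy_spanned <- sum(tapply(toy$year, toy$country, function(y) max(y) - min(y) + 1))

# Complete panel: every country in every year of the overall period
# (assumption of this sketch: the period is counted inclusively)
toy_total <- length(unique(toy$country)) * (max(toy$year) - min(toy$year) + 1)

c(share_of_spanned = 100 * toy_observed / toy_spanned,
  share_of_total   = 100 * toy_observed / toy_total)
```

In this toy case, five observed country-years fall within a span of eight and a complete panel of twelve, so the share of the complete panel is necessarily the smaller of the two.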

The left panel of Figure\nobreakspace{}\ref{tgov_item_country_plots} maps the global distribution of observed country-years.
European and Latin American countries have longer time series due to the frequent fielding of Eurobarometer and AmericasBarometer surveys, as well as strong scholarly interest in democratic and emerging regimes.
In contrast, data from Asian and African countries is limited. 
The upper right panel further illustrates this geographical disparity: Germany leads with 54 country-year-item observations, followed by Spain and Finland. 
The lower right panel shows that few relevant survey items existed before 1990.
Country coverage peaked in `r y_gov_peak_year`, when respondents in `r y_gov_peak_nn` countries were asked about trust in government.
Overall, although such questions appeared as early as the 1970s, they were not asked regularly or widely until the 1990s, and coverage has remained geographically uneven.

In the next section, I describe how sparse and non-comparable survey data are harmonized into comparable time-series trust estimates using a latent variable model.

## Measurements
Latent variable measurement assumes that the concept of interest is not directly observable but can be inferred from individuals' responses to relevant questions.
Recently, pioneering studies have developed latent variable models tailored to cross-national survey data [@Claassen2019; @Caughey2019; @McGann2019; @Kolczynska2024].
In this paper, I adopt the Dynamic Comparative Public Opinion (DCPO) model developed by @Solt2020c, which fits cross-national survey data better than leading alternatives [@Claassen2019;@Caughey2019] and handles sparsity without requiring dense coverage or auxiliary population characteristics [@McGann2019; @Kolczynska2024].

The DCPO model addresses two principal challenges in the source data: incomparability and sparsity.
To tackle incomparability across survey questions, the model includes two parameters.
The _difficulty_ parameter captures how much trust is required to endorse a response (e.g., "a great deal" vs. "somewhat"), and the _dispersion_ parameter indicates how sensitively responses reflect changes in the latent trust level.
A lower dispersion value means that changes in responses to a question track changes in the latent trait more closely, whereas a higher value indicates noisier responses.
Questions that are asked widely across countries and years anchor item parameters with greater precision, while rare items provide weaker information.
By estimating item difficulty and dispersion and leveraging common items as anchors, DCPO maps responses from different questions and surveys onto a single latent scale, yielding comparable country–year estimates.
In other words, questions with different wordings and response scales are placed on a common scale, and widely asked questions stabilize that scale.
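
As a rough illustration of how the two item parameters operate, the sketch below uses a simplified logistic response curve.
It is an assumption made for exposition, not the DCPO likelihood itself (see @Solt2020c for the full specification); the function `endorse_prob()` and all parameter values are hypothetical.

```r
# Simplified response curve (an illustrative assumption, not the DCPO model):
# probability of giving at least a particular response at latent trust `theta`
endorse_prob <- function(theta, difficulty, dispersion) {
  plogis((theta - difficulty) / dispersion)
}

theta <- c(-1, 0, 1)  # hypothetical low, middle, and high latent trust

# Higher difficulty: the response is endorsed less often at every trust level
easy <- endorse_prob(theta, difficulty = -0.5, dispersion = 1)
hard <- endorse_prob(theta, difficulty =  0.5, dispersion = 1)

# Lower dispersion: response shares react more sharply to the same change in theta
sharp <- endorse_prob(theta, difficulty = 0, dispersion = 0.5)
noisy <- endorse_prob(theta, difficulty = 0, dispersion = 2)

round(rbind(easy, hard, sharp, noisy), 2)
```

Under this simplified form, an item's difficulty shifts its curve along the latent scale and its dispersion flattens or steepens the curve, which is why items with different wordings can still inform one underlying trust score.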

To handle sparsity, DCPO adopts random-walk priors within countries, estimating missing latent values as the previous year’s trust level plus a small random shock.
If a year lacks survey data, the estimate is informed by the most recent available data point in that country. 
Uncertainty increases as the time gap between observed years grows.
As shown in Figure\nobreakspace{}\ref{tgov_item_country_plots}, the geographical and temporal distribution of raw data is uneven: European countries generally have dense time series, while African and Asian countries often have fewer observations.
Consequently, estimates from data-rich contexts rely more directly on survey responses and carry less uncertainty, whereas estimates from data-sparse contexts carry greater uncertainty. 
Although the model smooths estimates over time to provide continuous coverage, uncertainty remains.
This uncertainty propagates into downstream analyses, so results and inferences based on data-sparse contexts should be interpreted more cautiously.
Approaches for incorporating this uncertainty are discussed later.
For details on the DCPO model, see @Solt2020c [pp. 3–8].
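
The way uncertainty accumulates across unobserved years can be seen in a small simulation.
This is a sketch of a generic Gaussian random walk projected forward from a single observed value, not of the DCPO sampler; `sigma`, `n_sims`, the starting value, and the gap lengths are hypothetical values chosen only for illustration.

```r
set.seed(1)

sigma  <- 0.05         # hypothetical sd of the yearly shock
n_sims <- 10000        # number of simulated trajectories
gaps   <- c(1, 5, 10)  # years since the last observed value

# Each trajectory follows theta_t = theta_{t-1} + shock_t, starting from 0.5
shocks <- matrix(rnorm(n_sims * max(gaps), mean = 0, sd = sigma), nrow = n_sims)
paths  <- 0.5 + t(apply(shocks, 1, cumsum))  # rows = trajectories, columns = years

# Spread of simulated values after each gap; a random walk implies sigma * sqrt(gap)
sim_sd    <- apply(paths[, gaps], 2, sd)
theory_sd <- sigma * sqrt(gaps)

data.frame(gap = gaps, sim_sd = round(sim_sd, 3), theory_sd = round(theory_sd, 3))
```

The spread grows with the square root of the gap, so an estimate ten years after the last survey is far less certain than one a year after it. In a fitted model, years lying between two surveys are constrained from both sides, so this forward-only sketch overstates the uncertainty for such years.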

Using the `DCPO` R package [@Solt2020a], I estimated country–year TrustGov scores.

```{r dcpo_input, eval=FALSE, include=FALSE, results=FALSE}

dcpo_input <- DCPOtools::format_dcpo(dcpo_input_raw_gov,
                                     scale_q = "PT_natgov_4",
                                     scale_cp = 2)

save(dcpo_input, file = here::here("data", "dcpo_input.rda"))
```

```{r dcpo, eval=FALSE, include=FALSE, results=FALSE}
iter <- 2000
 
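# locate the Stan program shipped with the installed DCPO package,
# whichever library path it was installed to
dcpo <- system.file("stan", "dcpo.stan", package = "DCPO", mustWork = TRUE) |> 
  cmdstan_model()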

dcpo_output <- dcpo$sample(
  data = dcpo_input[1:13], 
  max_treedepth = 14,
  adapt_delta = 0.99,
  step_size = 0.005,
  seed = 324, 
  chains = 4, 
  parallel_chains = 4,
  iter_warmup = iter/2,
  iter_sampling = iter/2,
  refresh = iter/50
)

results_path <- here::here(file.path("data", 
                                     iter, 
                                  {str_replace_all(Sys.time(),
                                                      "[- :]",
                                                   "") %>%
                                         str_replace("\\d{2}.\\d{6}$",
                                                     "")}))

dir.create(results_path, 
           showWarnings = FALSE, 
           recursive = TRUE)

dcpo_output$save_data_file(dir = results_path,
                           random = FALSE)
dcpo_output$save_output_files(dir = results_path,
                              random = FALSE)
```


```{r dcpo_results,eval=FALSE, include=FALSE, results=FALSE}
 if (!exists("results_path")) {
   latest <- "20250312060547.660"
   results_path <- here::here("data", "2000", latest)
   
   # Define OSF_PAT in .Renviron: https://docs.ropensci.org/osfr/articles/auth
   if (!file.exists(file.path(results_path, paste0("dcpo-", latest, "-1.csv")))) {
     dir.create(results_path, showWarnings = FALSE, recursive = TRUE)
     osf_retrieve_node("qr6ca") %>% 
       osf_ls_files() %>% 
       filter(str_detect(name, latest)) %>% 
       osf_download(path = results_path)
   }
}
 
dcpo_output <- as_cmdstan_fit(here::here(results_path,
                                   list.files(results_path,
                                              pattern="csv$")))

```

```{r dcpo_summary}
#load(file = here::here("data", "dcpo_input.rda"))

#dcpo_output <- readRDS(here::here("data","dcpo_output.rds"))
#theta_summary <- DCPOtools::summarize_dcpo_results(dcpo_input,
#                                                   dcpo_output,
#                                                   "theta")
#

#save(theta_summary, file = here::here("data",
#                                      "theta_summary.rda"))

load(file = here::here("data", "theta_summary.rda"))



```

```{r theta_results}

#theta_results <- extract_dcpo_results(dcpo_input,
#                                      dcpo_output,
#                                      par = "theta")
load(file = here::here("data", "theta_results.rda"))
theta_results_list <- theta_results %>%
  nest(data = c(country, year,theta))  


res_cy <- nrow(theta_summary) %>% 
  scales::comma()

res_c <- theta_summary %>% 
  pull(country) %>% 
  unique() %>% 
  length()




```



# Results

```{r cs, fig.cap="TrustGov Scores, Most Recent Available Year (through 2020) \\label{cs_mry}", fig.height=10, fig.width=8}
n_panes <- 2
axis_text_size <- 10

p1_data <- theta_summary %>%
  group_by(country) %>%
  slice_max(year, n = 1) %>% # most recent year per country (replaces superseded top_n())
  ungroup() %>%
  arrange(mean) %>%
  transmute(country_year = paste0(country, " (", year, ")") %>% 
              str_replace("’", "'"),
            estimate = mean,
            conf.high = q90,
            conf.low = q10,
            pane = n_panes - (ntile(mean, n_panes) - 1),
            ranked = as.factor(ceiling(row_number()))) 

p_theta <- ggplot(p1_data,
                  aes(x = estimate, y = ranked)) +
  geom_segment(aes(x = conf.low, xend = conf.high,
                   y = ranked, yend = ranked),
               na.rm = TRUE,
               alpha = .4) +
  geom_point(fill = "black", shape = 21, size = .5, na.rm = TRUE) +
  theme_bw() + theme(legend.position="none",
                     axis.text.x  = element_text(size = axis_text_size,
                                                 angle = 90,
                                                 vjust = .45,
                                                 hjust = .95),
                     axis.text.y  = element_text(size = axis_text_size),
                     axis.title = element_blank(),
                     strip.background = element_blank(), 
                     strip.text = element_blank(),
                     panel.grid.major = element_line(linewidth = .3),
                     panel.grid.minor = element_line(linewidth = .15)) +
  scale_y_discrete(breaks = p1_data$ranked, labels=p1_data$country_year) +
  coord_cartesian(xlim=c(0, 1)) +
  facet_wrap(vars(pane), scales = "free", nrow = 1)


p_theta +
  plot_annotation(caption = "Note: Countries are ordered by their TrustGov score in the most recent available year. Gray whiskers represent 80% credible intervals.")

bottom5 <- p1_data %>% 
  arrange(ranked) %>% 
  slice(1:5) %>% 
  pull(country_year) %>% 
  str_replace(" \\(.*", "") %>% 
  knitr::combine_words()

```


Figure\nobreakspace{}\ref{cs_mry} displays the most recent available TrustGov score for each of the 115 countries and territories in the dataset. 
China and several Central Asian countries dominate the top positions, consistent with previous research documenting high levels of trust in these governments [@Schneider2017;@Paturyan2024;@Byaro2020].
Less corrupt countries like Denmark and Switzerland also rank highly, while `r bottom5` show the lowest recent scores. 
These lower-ranked countries faced serious challenges around the time of measurement, including corruption in Venezuela and Iraq, election-related violence in Brazil, and conflict or security threats in Tunisia and Libya.

There are well-known concerns about the authenticity of self-reported data in authoritarian contexts, especially for sensitive questions [@Blair2020]. 
However, empirical evidence is mixed.
Some studies find sensitivity biases are typically small [@Blair2020], while others suggest systematic misreporting under certain conditions [e.g., @Tannenberg2021], and still others find little evidence of bias in cases such as China [@Tang2016].
Both high- and low-trust countries in Figure\nobreakspace{}\ref{cs_mry} include regimes classified as partly free or not free by @fh1, suggesting that self-censorship or political wariness alone cannot account for why some authoritarian countries report high trust while others report low trust.
In addition, although respondents' understanding and reporting of political trust may differ across regime types, prior research has shown that latent variable models can measure trust in central political institutions comparably across contexts [@Schneider2017].
For transparency, I therefore report all estimates, while future research should continue to investigate the dynamics of trust in authoritarian contexts.

```{r ts, fig.cap="TrustGov Scores Over Time Across Selected Countries, 1973–2020 \\label{ts_plots}", fig.height=3.5,fig.width = 7}

countries <- c("Germany","United States","United Kingdom",
                "Greece", "Turkey","Australia",
               "China",  "India", "Philippines", 
                "Nigeria", "Argentina", "Mexico"
)

countries2 <- countries %>% 
  str_replace("United States", "U.S.") %>% 
  str_replace("United Kingdom", "U.K.")

c_res <- theta_summary %>% 
  filter(country %in% countries) %>%
  mutate(country = str_replace(country, "United States", "U.S.") %>% 
           str_replace("United Kingdom", "U.K.") %>% 
           factor(levels = countries2))

ggplot(data = c_res, aes(x = year, y = mean)) +
  theme_bw() +
  theme(legend.position = "none") +
  coord_cartesian(xlim = c(1972, 2025), ylim = c(0, 1)) +
  labs(x = NULL, y = "TrustGov score") +
  geom_ribbon(data = c_res, aes(ymin = q10, ymax = q90, linetype=NA), alpha = .25) +
  geom_line(data = c_res) +
  facet_wrap(~country, nrow = 2) +
  theme(axis.text.x  = element_text(size=7,
                                    angle = 90,
                                    vjust = .45,
                                    hjust = .95),
        strip.background = element_rect(fill = "white", colour = "white")) +
  patchwork::plot_annotation(caption = "Note: Lines represent TrustGov estimates; \ngray shading shows 80% credible intervals, reflecting the uncertainty around TrustGov estimates.")
```


Figure\nobreakspace{}\ref{ts_plots} illustrates how TrustGov scores evolve differently across 12 selected countries, with sharp increases in some contexts and long-term declines in others.
For example, trust in government has risen markedly in Germany, India, the Philippines, and Nigeria, likely reflecting stable governance under Merkel in Germany and Modi in India [@Oecd_trust2025; @Sardesai2023], a populist administration in the Philippines [@Curato2017], and reduced levels of violence in Nigeria [@Harding2024].
In contrast, trust has remained consistently high in China and relatively low in Australia.

TrustGov scores have declined steadily or dramatically in Greece, Mexico, Argentina, and the United States, primarily due to economic crises in Greece [@Ervasti2019], widespread corruption in Mexico [@Morris2010], financial instability and political dysfunction in Argentina [@cfr_argentina], and rising political polarization and partisanship in the United States [@Hetherington2018].

Some countries show greater volatility.
In the United Kingdom, shifts may be associated with Brexit, sovereignty debates, and immigration issues [@guardian_covid2025].
In Turkey, changes may reflect the personalization of political power and economic volatility [@Pewturks2024].

These visualizations show uncertainty in the estimates through the width of the 80% credible intervals and highlight the need for caution in interpretation.
It is also worth noting that the current dataset ends in 2020, coinciding with the onset of the COVID-19 pandemic. 
Post-2020 trends may deviate sharply from earlier trajectories, as some governments experienced surges of public trust during crisis response while others saw declines.
The next data release will incorporate post-2020 surveys to capture these dynamics.
Meanwhile, the cross-national and temporal variation already present in TrustGov scores invites in-depth analysis.

To assess whether TrustGov validly captures trust in national government across contexts and over time, I next report convergent and construct validation tests.


```{r internal_val_dat, include=FALSE}

process_dcpo_input_raw <- function(dcpo_input_raw_df) {
  dcpo_input_raw_df %>% 
    with_min_yrs(3) %>% 
    with_min_cy(5) %>% 
    with_min_yrs(3) %>% #?
    filter(year >= 1972 & n > 0) %>% 
    group_by(country) %>% 
    mutate(cc_rank = n()) %>% 
    ungroup() %>% 
    arrange(-cc_rank) %>% 
    mutate(item = str_remove_all(item, "_"))
}


dcpo_input_raw1 <- read_csv(here::here("data","dcpo_input_raw_gov.csv") ) %>% 
  process_dcpo_input_raw()


internal_tscs_dat <- dcpo_input_raw1 %>% 
  filter(item == "PTnatgov10") %>%  
  mutate(title = "All Country-Years",
         neg = FALSE) #Trust national government

cs_dat <- dcpo_input_raw1 %>%
  dplyr::select(year, survey, country) %>%
  distinct(year, survey, country) %>%
  group_by(year, survey) %>%
  summarise(n = n()) #gallup_vop2005 wellcome2018 pew2017 #Do you have confidence in each of the following, or not?  national gov

internal_cs_dat <- dcpo_input_raw1 %>% 
  filter(survey == "wellcome2018") %>%  
  mutate(title = "Wellcome (2018)",
         neg = FALSE) 
#Do you have confidence in each of the following, or not? 


ts_dat <- dcpo_input_raw1 %>%
  dplyr::select(year, item, survey, country) %>%
  distinct(year, item, survey,  country) %>%
  group_by(country,survey, item) %>%
  summarise(n = n()) 

internal_ts_dat <- dcpo_input_raw1 %>% 
  filter(survey == "allbus" & item == "PTnatgov7") %>%  
 # filter(country == "Germany") %>%  
  mutate(title = "Germany", #Germany
         neg = FALSE) #Trust national government TRUST: FEDERAL GOVERNMENT

# how much trust you place in FEDERAL GOVERNMENT. Please use this scale. 1 means you have absolutely no trust at all, 7 means you have a great deal of trust.

```


```{r internalval,fig.cap = "Convergent Validation Using Individual TrustGov Source Data Survey Items \\label{inter_val}"}

internal_tscs_plot <- validation_plot(internal_tscs_dat,
                                lab_x = .15,
                                lab_y = 95) +
  theme_bw() +
  theme(legend.position="none",
        axis.text  = element_text(size=8),
        axis.title = element_text(size=9),
        plot.title = element_text(hjust = 0.5, size = 9),
        strip.background = element_blank()) +
  coord_cartesian(xlim = c(0,1), ylim = c(0,100)) +
  labs(x = "TrustGov score",
       y = "% expressing at least some trust")

internal_cs_plot <- validation_plot(internal_cs_dat,
                                lab_x = .15,
                                lab_y = 95) +
  theme_bw() +
  theme(legend.position="none",
        axis.text  = element_text(size=8),
        axis.title = element_text(size=9),
        plot.title = element_text(hjust = 0.5, size = 9),
        # strip.text.x = element_text(size=5),
        strip.background = element_blank()) +
  coord_cartesian(xlim = c(0,1), ylim = c(0,100)) +
  labs(x = "TrustGov score",
       y = "% expressing at least some trust \n in national government")

internal_ts_plot <- validation_plot(internal_ts_dat,
                                    lab_x = 1990,
                                    lab_y = .95) +
  theme_bw() +
  theme(legend.position="none",
        axis.text  = element_text(size=8),
        axis.title = element_text(size=9),
        plot.title = element_text(hjust = 0.5, size = 9),
        strip.background = element_blank()) +
  coord_cartesian(ylim = c(0,1), xlim = c(1984, 2018)) +
   scale_x_continuous(
    minor_breaks = seq(1984, 2018, by = 8),
    breaks = seq(1984, 2018, by = 8), limits = c(1984, 2018)) +
  labs(x = "Year",
       y = "Score") +
  annotate("text", x = 2005, y = .5, size = 2,
           label = 'TrustGov score') + #confidence in executive branch
  annotate("text", x = 2008, y = .2, size = 2,
         #  label = "Argentina")
           label = "Germany Allbus")


internal_tscs_plot + internal_cs_plot + internal_ts_plot  +
  patchwork::plot_annotation(caption = "Note: Gray whiskers and shading represent 80% credible intervals.")

```

Convergent validation tests whether a measure is empirically associated with alternative indicators of the same concept [@Adcock2001, 540].
I conducted "internal" convergent validation [see, e.g., @Caughey2019, 689; @Solt2020c, 10] by comparing TrustGov scores against individual source items used in estimation.

Figure\nobreakspace{}\ref{inter_val} presents three validation plots comparing TrustGov scores with the percentage of respondents expressing at least some trust, calculated using responses at or above the median of each item’s response scale.
The left panel shows a scatterplot of country-years in which TrustGov scores are plotted against responses to the Eurobarometer question: "Please tell me how much you personally trust each of the following institutions using a scale from 1 to 10, where [1] means ‘you do not trust the institution at all’ and [10] means ‘you trust it completely’."
The strong correlation [R = 0.88] indicates that TrustGov scores effectively capture variations in trust across country-years.
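
To make the "at least some trust" calculation concrete, the following is a minimal sketch in R.
It assumes unweighted individual responses and takes the cutoff to be the median of the item's response categories; the helper `share_trusting()` and the example responses are hypothetical and are not drawn from the TrustGov source data.

```r
# Hypothetical helper: percentage of responses at or above the median
# of the item's response scale (assumes unweighted responses).
share_trusting <- function(responses, scale_max, scale_min = 1) {
  cutoff <- median(seq(scale_min, scale_max))
  100 * mean(responses >= cutoff, na.rm = TRUE)
}

# Made-up responses to a 10-point item (1 = no trust, 10 = complete trust);
# the cutoff is 5.5, so responses of 6 through 10 count as trusting.
resp_10pt <- c(1, 3, 5, 6, 6, 7, 8, 9, 10, 2)
share_10pt <- share_trusting(resp_10pt, scale_max = 10)

# Made-up responses to a 4-point item; the cutoff is 2.5 and the
# missing response is dropped from the denominator.
resp_4pt <- c(1, 2, 2, 3, 4, 4, NA, 3)
share_4pt <- share_trusting(resp_4pt, scale_max = 4)

round(c(ten_point = share_10pt, four_point = share_4pt), 1)
```

With an odd number of categories the median is itself a response option, so in this sketch the midpoint category counts as expressing trust.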

The middle panel compares TrustGov scores with the responses to the question: "How much do you trust the national government in your country?" from the Wellcome Global Monitor Survey in 2018.
This item was asked in more countries than any other trust question in a single survey over the past decade, and the strong correlation [R = 0.94] demonstrates the broad applicability of TrustGov scores across diverse contexts.

Finally, the right panel compares TrustGov scores with the trend in the longest-running item in the German ALLBUS survey, fielded since 1984: "How much trust do you place in the federal government?"
TrustGov scores align with the observed trend [R = 0.93], capturing historical changes over time. 
In all tests, correlations are estimated accounting for measurement uncertainty.
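
One simple way to account for measurement uncertainty in a correlation is to compute it separately for each posterior draw and then average across draws.
The sketch below illustrates this idea in R with simulated data; the object names, sample sizes, and noise levels are assumptions chosen for illustration, not TrustGov estimates.

```r
# Illustrative only: simulated draws stand in for a real posterior.
set.seed(324)
n_cy <- 50      # hypothetical number of country-years
n_draws <- 200  # hypothetical number of posterior draws

true_theta <- runif(n_cy, 0.1, 0.9)
# Hypothetical survey percentage to validate against
pct_trust <- 100 * true_theta + rnorm(n_cy, sd = 8)

# One row per country-year, one column per posterior draw of theta
theta_draws <- sapply(seq_len(n_draws),
                      function(d) true_theta + rnorm(n_cy, sd = 0.05))

# Correlate each draw with the survey item, then average across draws
cor_by_draw <- apply(theta_draws, 2, cor, y = pct_trust)
mean_cor <- mean(cor_by_draw)

# Correlation based on posterior means only, for comparison
point_cor <- cor(rowMeans(theta_draws), pct_trust)

round(c(draw_average = mean_cor, point_estimate = point_cor), 2)
```

Because each draw carries its own noise, the draw-averaged correlation is usually somewhat lower than one computed from posterior means alone, which is the more conservative summary.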


```{r tgovextval2,  fig.cap = "Construct Validation Using Satisfaction, Perceived Corruption, and Approval Rating Data \\label{tgov_ev2}", fig.height=3.5,fig.width = 8}

load(here::here("data","extval2.rda"))

voter_corruption <-  voter_corruption %>% 
  left_join(theta_summary,
            by = c("country", "year")) %>% 
  filter(!is.na(mean))      


tgov_voter_merged <- theta_results_list %>%
  mutate(
    merged_data = purrr::map(
      data,
      ~ .x %>%
        right_join(
          voter_corruption,
          by = c("country", "year")
        ) %>%
        filter(!is.na(theta))
    )
  )


tgov_corrupt_cor <- tgov_voter_merged %>%
  mutate(
    cor = map_dbl(
      merged_data,
      ~ .x %>%
        mutate(theta = theta * 100) %>%
         filter(year == 2018) %>%
        with(cor(theta, cor_prop, use = "pairwise.complete.obs"))
    )
  )

tgovcor_cor <- tgov_corrupt_cor %>%
  summarise(mean_cor = mean(cor, na.rm = TRUE)) %>%
  pull(mean_cor) %>%
  round(2) %>%
  paste0("R = ", .)


voter_cor_label <- tibble(mean = 0.15, cor_prop = 98, label = tgovcor_cor)

voter_cor_plot <- ggplot(voter_corruption %>% filter (year == 2018),
                         aes(x = mean,
                             y = cor_prop)) +
  geom_segment(aes(x = q10, xend = q90,
                   y = cor_prop, yend = cor_prop),
               na.rm = TRUE,
               alpha = .2) +
  geom_smooth(method = 'lm', se = FALSE) +
  theme_bw() + theme(legend.position="none",
                    axis.text  = element_text(size=8),
                    axis.title = element_text(size=9),
                     plot.title = element_text(hjust = 0.5, size = 9)
                    ) +
   coord_cartesian(xlim = c(0,1), ylim = c(0,100)) +
  labs(x = "TrustGov score", y = '% saying there is at least \n some widespread corruption (2018)') +
  ggtitle("Eurobarometer 88.2, CSES5, \n WVS7")  +
  geom_label(data = voter_cor_label, aes(label = label),size = 2.5)


sat_tgov <-  wvs7_sat  %>%
  left_join(theta_summary %>% 
              select(-kk, -tt),
            by = c("country", "year")) %>%
  filter(!is.na(mean))


tgov_sat_merged <- theta_results_list %>%
  mutate(
    merged_data = purrr::map(
      data,
      ~ .x %>%
        right_join(
          sat_tgov,
          by = c("country", "year")
        ) %>%
        filter(!is.na(theta))
    )
  )


tgov_sat_cor <- tgov_sat_merged %>%
  mutate(
    cor = map_dbl(
      merged_data,
      ~ .x %>%
        mutate(theta = theta * 100) %>%
        with(cor(theta, prop, use = "pairwise.complete.obs"))
    )
  )

tgovsat_cor <- tgov_sat_cor %>%
  summarise(mean_cor = mean(cor, na.rm = TRUE)) %>%
  pull(mean_cor) %>%
  round(2) %>%
  paste0("R = ", .)

tgov_sat_label <- tibble(mean = 0.1, prop = 95, label = tgovsat_cor)

tgov_sat_plot <- ggplot(sat_tgov,
                         aes(x = mean,
                             y = prop)) +
  geom_segment(aes(x = q10, xend = q90,
                   y = prop, yend = prop),
               na.rm = TRUE,
               alpha = .2) +
  geom_smooth(method = 'lm', se = FALSE) +
  theme_bw() + theme(legend.position="none",
                    axis.text  = element_text(size=8),
                  axis.title = element_text(size=9),
                   plot.title = element_text(hjust = 0.5, size = 9)
                  ) +
   coord_cartesian(xlim = c(0,1), ylim = c(0,100)) +
  labs(x = "TrustGov score", y = "% satisfied with political system performance \n (WVS7)") +
  ggtitle("World Values Survey Wave 7")  +
  geom_label(data =tgov_sat_label, aes(label = label),size = 2.5)


ead2019 <- ead2019  %>% 
  left_join(theta_summary, by = c("country", "year")) %>% 
  filter(!is.na(mean)) %>% 
  mutate(iso3c = countrycode::countrycode(country,
                                          origin = "country.name",
                                          destination = "iso3c"))

tgov_ead_merged <- theta_results_list %>%
  mutate(
    merged_data = purrr::map(
      data,
      ~ .x %>%
        right_join(
          ead2019,
          by = c("country", "year")
        ) %>%
        filter(!is.na(theta))
    )
  )


tgov_ead_cor <- tgov_ead_merged %>%
  mutate(
    cor = map_dbl(
      merged_data,
      ~ .x %>%
        mutate(theta = theta * 100) %>%
         filter(country %in% oecd_countries) %>%
        filter(year == 2018) %>%
        with(cor(theta, Approval_Smoothed, use = "pairwise.complete.obs"))
    )
  )


tgovead_cor <- tgov_ead_cor %>%
  summarise(mean_cor = mean(cor, na.rm = TRUE)) %>%
  pull(mean_cor) %>%
  round(2) %>%
  paste0("R = ", .) 

tgov_ead_label <- tibble(mean = 0.1, Approval_Smoothed = 95, label = tgovead_cor)


tgov_ead_plot <- ggplot(ead2019  %>% 
         filter(country %in% oecd_countries) %>%
        filter(year == 2018),
                   aes(x = mean,
                       y = Approval_Smoothed)) +
  geom_segment(aes(x = q10, xend = q90,
                   y = Approval_Smoothed, yend = Approval_Smoothed),
               na.rm = TRUE,
               alpha = .2) +
  geom_smooth(method = 'lm', se = FALSE)  +
  theme_bw() + theme(legend.position="none",
                    axis.text  = element_text(size=8),
                  axis.title = element_text(size=9),
                   plot.title = element_text(hjust = 0.5, size = 9)
                  ) +
  coord_cartesian(xlim = c(0,1), ylim = c(0,100)) +
  labs(x = "TrustGov score", y = "Executive approval estimates \n in OECD countries \n (Executive Approval Project) ") +
   ggtitle("Executive Approval Estimates, \n 2018")  +
  geom_label(data = tgov_ead_label, aes(label = label), size = 2.5)  


(tgov_sat_plot + voter_cor_plot + tgov_ead_plot ) + patchwork::plot_annotation(caption = "Note: Gray whiskers and shading represent 80% credible intervals.")




```


Construct validation assesses whether a measure is empirically correlated with indicators that theory suggests are causally related to that measure [@Adcock2001, 542]. 
I conducted construct validation using three such indicators: public satisfaction with political system performance, perceived corruption, and executive approval ratings.
Prior research shows that trust is a strong predictor of satisfaction with political system performance [@Hetherington2018] and approval ratings [@citrin2001political], and is negatively associated with perceived corruption [@Anderson2003].

The results are presented in Figure\nobreakspace{}\ref{tgov_ev2}.
The left panel shows a clear positive relationship [R = 0.71] between TrustGov scores and satisfaction with political system performance, measured as the share of respondents expressing at least some satisfaction in the WVS Wave 7. 
A similar positive correlation [R = 0.71] between TrustGov scores and executive approval ratings appears in the right panel.
The approval ratings are drawn from smoothed estimates for OECD countries in 2018 from the Executive Approval Project (version 2) [@carlin2019executive].
The center panel shows a negative relationship [R = -0.72] between TrustGov and perceived widespread corruption, as surveyed in Eurobarometer, the Comparative Study of Electoral Systems (CSES), and the WVS in 2018.
In all three tests, the correlation signs align with theoretical expectations.

In sum, the convergent and construct validation results provide strong evidence that TrustGov is a valid measure of trust in national government.

```{r tgovextval, fig.cap = "Additional Validation Tests Using Data on Trust in Parliament and Trust in Public Administration \\label{tgov_ev1}"}

load(here::here("data","extval1.rda"))


ext_wvs_election_plot <- validation_plot(ext_wvs_election_dat,
                                      lab_x = .1,
                                      lab_y = 95) +
  theme_bw() +
  theme(legend.position="none",
        axis.text  = element_text(size=8),
        axis.title = element_text(size=9),
        plot.title = element_text(hjust = 0.5, size = 9),
        strip.background = element_blank()) +
  coord_cartesian(xlim = c(0,1), ylim = c(0,100)) +
  labs(x = "TrustGov score",
       y = "% expressing some trust or more\nin elections") +
   ggtitle("World Values Survey Wave 7")  



ext_eqls_parl_plot <- validation_plot(ext_eqls_parl_dat,
                                      lab_x = .1,
                                      lab_y = 95) +
  theme_bw() +
  theme(legend.position="none",
        axis.text  = element_text(size=8),
        axis.title = element_text(size=9),
        plot.title = element_text(hjust = 0.5, size = 9),
        strip.background = element_blank()) +
  coord_cartesian(xlim = c(0,1), ylim = c(0,100)) +
  labs(x = "TrustGov score",
       y = "% expressing some trust or more\nin parliament")


ext_eb_pa_plot <- validation_plot(ext_eb_pa_dat,
                                      lab_x = .1,
                                      lab_y = 95) +
  theme_bw() +
  theme(legend.position="none",
        axis.text  = element_text(size=8),
        axis.title = element_text(size=9),
        plot.title = element_text(hjust = 0.5, size = 9),
        strip.background = element_blank()) +
  coord_cartesian(xlim = c(0,1), ylim = c(0,100)) +
  labs(x = "TrustGov score",
       y = "% expressing trust in \n public administration")

# For each of the following institutions, please tell me if you tend to trust it or tend not to trust it


#ext_wvs_election_plot 
 ext_eqls_parl_plot + ext_eb_pa_plot+ patchwork::plot_annotation(caption = "Note: Gray whiskers and shading represent 80% credible intervals.")

```

In addition to the tests above, I examined survey items that were not used to estimate TrustGov but are empirically closely related to trust in government: trust in parliament and trust in public administration. 
Although the dimensional structure of political trust is debated, empirically, trust in government typically moves closely with trust in parliament, and the two are often grouped as indicators of political trust [@Dellmuth2024; @Van2024]. 
Likewise, although scholars distinguish between trust in government and trust in public administration, empirical work finds a strong positive association between them [for further discussion, see @Camoes2019]. 

In Figure\nobreakspace{}\ref{tgov_ev1}, the left panel compares TrustGov scores with trust in parliament from the European Quality of Life Survey (EQLS), and the right panel compares them with the percentage of respondents who expressed trust in public administration in Eurobarometer.

Across both tests, TrustGov is positively correlated with these related forms of institutional trust, with a stronger correlation for trust in parliament [R = 0.82] than for trust in public administration [R = 0.74].

# Discussion & Conclusion

Although political trust is a long-standing interdisciplinary topic, much of our understanding still comes from single countries or regions with rich longitudinal data that may not generalize elsewhere, or from cross-sectional snapshots that cannot capture dynamic changes over time [@Kolczynska2024].
The TrustGov dataset helps address this gap by providing comparable cross-national time-series measures of trust in national government.
It supports research on the determinants and consequences of trust across contexts and over time, for example, how trust shapes attitudes toward climate change and public health policies, and how it interacts with polarization, personalist leaders, and crisis governance.

TrustGov also supports qualitative and mixed-methods research. 
Researchers can use TrustGov trends to identify countries or periods with sharp shifts in trust and explore them further through case studies.
Identified turning points can be examined with process tracing or interviews to probe mechanisms behind observed shifts.
The dataset also facilitates classic comparative strategies, such as most similar systems design and most different systems design. 
Researchers can select cases with similar contexts but divergent trust trajectories, or cases that differ in most respects yet exhibit similar trust patterns.
In this way, TrustGov connects large-scale patterns with fine-grained qualitative accounts and strengthens contextual explanations of how trust develops in specific settings.


TrustGov offers a valuable resource for studying trust in government, but it has limitations.
First, as an aggregate-level measure, it may obscure subnational variation, polarization in trust, or case-specific dynamics that require country expertise and qualitative or mixed methods to illuminate.
Second, the smoothing approach introduces measurement uncertainty, and ignoring uncertainty risks distorting inferences [@Tai2024].
To facilitate appropriate use, I provide full posterior draws so researchers can incorporate uncertainty directly into their downstream analyses using any of several approaches [e.g., @Caughey2018; @Tai2024; @Woo2025]. 
Finally, the current release ends in 2020.
The COVID-19 pandemic might have reshaped trust in many countries.
Researchers should be mindful of this temporal boundary when drawing inferences.
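
As an illustration of how the posterior draws mentioned above might be carried into a downstream model, the sketch below fits the same regression once per draw and pools the results with Rubin's rules, treating the draws like multiple imputations.
This is only one common option and is not necessarily the procedure used in the works cited; the data are simulated, and every object name and parameter value is an assumption made for the example.

```r
# Illustrative only: simulated draws, not the released TrustGov posterior.
set.seed(2020)
n_cy <- 100     # hypothetical number of country-years
n_draws <- 100  # hypothetical number of posterior draws

latent <- runif(n_cy, 0.2, 0.8)
outcome <- 1 + 2 * latent + rnorm(n_cy, sd = 0.3)  # hypothetical outcome
theta_draws <- sapply(seq_len(n_draws),
                      function(d) latent + rnorm(n_cy, sd = 0.05))

# Fit the same model once per draw; keep the slope and its variance
fits <- lapply(seq_len(n_draws), function(d) {
  fit <- lm(outcome ~ theta_draws[, d])
  c(est = unname(coef(fit)[2]), var = vcov(fit)[2, 2])
})
fits <- do.call(rbind, fits)

# Pool with Rubin's rules: total variance combines the average
# within-draw variance and the between-draw variance of the estimates
m <- nrow(fits)
pooled_est <- mean(fits[, "est"])
within_var <- mean(fits[, "var"])
between_var <- var(fits[, "est"])
pooled_se <- sqrt(within_var + (1 + 1 / m) * between_var)

round(c(estimate = pooled_est, std_error = pooled_se), 3)
```

The pooled standard error is never smaller than the average within-draw standard error, so inferences reflect the uncertainty in the latent trust estimates rather than treating posterior means as observed data.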

The TrustGov project will continue to be updated and improved.
The next planned release will incorporate surveys conducted after 2020, including post-pandemic waves.
Future releases will expand the dataset on a rolling basis as new cross-national surveys become publicly available.

# References {.unlisted .unnumbered}

::: {#refs-text}
:::


\pagebreak

